Poliqarp: An open source corpus indexer and search engine with syntactic extensions
Abstract
This paper presents recent extensions to Poliqarp, an open source tool for indexing and searching morphosyntactically annotated corpora. These extensions turn it into a tool for indexing and searching certain kinds of treebanks, complementing existing treebank search engines. In particular, the paper discusses the motivation for such a new tool, the extended query syntax of Poliqarp, and issues of implementation and efficiency.
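As a rough illustration of what indexing and searching a morphosyntactically annotated corpus involves, here is a minimal Python sketch that builds an inverted index from attribute values to token positions and answers single-token queries. It is not Poliqarp's implementation or query language: the toy corpus, the tag strings (loosely modelled on IPI PAN-style positional tags), and the names `build_index` and `find` are hypothetical. Because Poliqarp does not assume a unique tag per word form, each token here may carry several interpretations. Candidates obtained by intersecting posting sets are therefore re-checked so that all conditions hold within a single interpretation.

```python
from collections import defaultdict

# Hypothetical toy corpus (not Poliqarp's data format): each token has an
# orthographic form and one or more (base form, tag) interpretations.
CORPUS = [
    ("Ala", [("Ala", "subst:sg:nom:f")]),
    ("ma", [("mieć", "fin:sg:ter:imperf"), ("mój", "adj:sg:nom:f:pos")]),
    ("kota", [("kot", "subst:sg:acc:m2"), ("kot", "subst:sg:gen:m2")]),
]


def build_index(corpus):
    """Inverted index: (attribute, value) -> set of token positions."""
    index = defaultdict(set)
    for position, (orth, interps) in enumerate(corpus):
        index[("orth", orth)].add(position)
        for base, tag in interps:
            index[("base", base)].add(position)
            # Part of speech is taken to be the first colon-separated tag slot.
            index[("pos", tag.split(":")[0])].add(position)
    return index


def find(corpus, index, orth=None, base=None, pos=None):
    """Return sorted positions of tokens matching all given conditions.

    Posting sets are intersected to get candidates; each candidate is then
    verified so that base and pos are satisfied by the same interpretation.
    """
    conditions = [(attr, value)
                  for attr, value in (("orth", orth), ("base", base), ("pos", pos))
                  if value is not None]
    if not conditions:
        return list(range(len(corpus)))
    # .get avoids inserting empty entries into the defaultdict.
    candidates = set.intersection(*(index.get(c, set()) for c in conditions))
    result = []
    for position in sorted(candidates):
        _, interps = corpus[position]
        if any((base is None or b == base) and
               (pos is None or t.split(":")[0] == pos)
               for b, t in interps):
            result.append(position)
    return result
```

For example, `find(CORPUS, build_index(CORPUS), base="mój", pos="fin")` returns `[]`: both conditions hit token 1 in the index, but no single interpretation of "ma" satisfies both. Verifying candidates is what keeps ambiguous annotations from producing such spurious matches.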
Similar resources
On Heads and Coordination in Valence Acquisition
The aim of this paper is to present the design of a partial syntactic annotation of the IPI PAN Corpus of Polish [22] and the corresponding extension of the corpus search engine Poliqarp [25,12] developed at the Institute of Computer Science PAS and currently employed in Polish and Portuguese corpora projects. In particular, we will argue for the need to distinguish between, and represent both, ...
A Framework for Bridging the Gap Between Open Source Search Tools
Building a search engine that can scale to billions of documents while satisfying the needs of the users presents serious challenges. Few success stories have been reported so far [36]. Here, we report our experience in building YouSeer, a complete open source search engine tool that includes both an open source crawler and an open source indexer. Our approach takes other open source compone...
A Search Tool for Corpora with Positional Tagsets and Ambiguities
This article describes POLIQARP, a corpus indexing and query tool, which understands positional tagsets and which does not assume that word forms are annotated with unique morphosyntactic tags. POLIQARP is designed to be applicable to a variety of languages and tagsets: it works with XML-encoded texts, uses the UTF-8 character set, and allows for an external specification of the tagset. Current...
Dynamic Load Balancing Model: Preliminary Assessment of a Biological Model for a Pseudo-search Engine
Emulation of current World Wide Web (WWW) search engines, using methodologies derived from Genetic Programming (GP) and Knowledge Discovery in Databases (KDD), was used for the PseudoSearch Engine's initial parallel implementation of an indexer simulator. The indexer was implemented to follow some of the characteristics currently implemented by the AltaVista and Inktomi search engines, which index ...
MapReduce Based Information Retrieval Algorithms for Efficient Ranking of Webpages
In this paper, the authors discuss the MapReduce implementation of crawler, indexer and ranking algorithms in search engines. The proposed algorithms are used in search engines to retrieve results from the World Wide Web. A crawler and an indexer in a MapReduce environment are used to improve the speed of crawling and indexing. The proposed ranking algorithm is an iterative method that makes us...
Publication date: 2007